It is known that in multiprocessing systems composed of many identical
processing units operating in parallel, certain timing anomalies may 
occur; e.g., an increase in the number of processing units can cause an 
increase in the total length of time needed to process a fixed set of tasks. 
In this paper, precise bounds are derived for several anomalies of this type. 

I. INTRODUCTION 

In recent years there has been increased interest in the study of the
potential advantages afforded by the use of a computer with many
processors in parallel. While it is generally true that a set of tasks may
be processed in less time by this type of multiprocessing, it has been
pointed out that certain anomalies (see Refs. 1 and 2) may occur, even
though the processors are used in a very "natural" way (e.g., it can
happen that increasing the number of processors can increase the time
required to complete a given set of tasks).

It is the purpose of this paper to derive precise bounds on the extent 
to which these anomalies can affect the time required to process a set 
of tasks, given certain rather natural rules for the operation of the 
multiprocessing system. 

1.1 Description of the System 

Let us suppose that we are given $n$ identical processing units $P_i$,
$1 \le i \le n$, and a set of tasks $T = \{T_1, \ldots, T_m\}$ to be processed by
the $P_i$. We are also given a partial-order* $\prec$ on $T$ and a function
$\mu\colon T \to [0, \infty)$. Once a processor $P_i$ begins a task $T_j$, it works without
interruption on $T_j$ until completion of that task, taking altogether
$\mu(T_j)$ units of time. It is also required that if $T_i \prec T_j$ then $T_j$ cannot
be started until $T_i$ is completed. The $P_i$ execute the $T_j$ in the following
way: We are given a linear ordering $L$: $(T_{k_1}, \ldots, T_{k_m})$ of $T$ called a
task list (or priority list). In general, at any time $t$ a $P_i$ completes a
task, it immediately (and instantaneously) scans the list $L$ (starting
from the beginning) until it comes to the first task $T_j$ which has not
yet begun to be executed. If all the predecessors of $T_j$ (i.e., those $T_i \prec T_j$)
have been completed by time $t$ then $P_i$ begins working on $T_j$. Otherwise
$P_i$ proceeds to the next task $T_{j'}$ in $L$ which has not yet begun to be
executed, etc. If $P_i$ proceeds through the entire list $L$ without finding
a task to execute then $P_i$ becomes idle (we shall also say that $P_i$ is
working on an empty task). $P_i$ remains idle until some other $P_j$ completes
a task, at which time $P_i$ (and of course $P_j$) immediately scans the
list $L$ as before for possible tasks to execute. If two processors $P_i$ and
$P_j$, $i < j$, simultaneously attempt to begin the same task $T_k$, it will be
our convention to assign $T_k$ to $P_i$, the processor with the smaller index.
The processors all start scanning $L$ at time $t = 0$ and proceed in the
above-mentioned fashion until some time $\omega$, the least time for which
all the tasks have been completed.

* See Ref. 2.
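
The scanning rule described above can be sketched as a small simulation. This is a minimal illustration and not part of the original text: the name `list_schedule`, its arguments, and the three-task instance are assumptions chosen for the example, and the precedence relation is assumed to be acyclic.

```python
def list_schedule(n, task_list, mu, preds):
    """Finishing time omega under the list-scheduling rule described above.

    n         -- number of identical processors
    task_list -- the list L, highest priority first
    mu        -- dict: task -> execution time (non-negative)
    preds     -- dict: task -> iterable of predecessors (assumed acyclic)
    """
    finish = {}            # task -> completion time, recorded when the task starts
    free_at = [0] * n      # time at which each processor next becomes free
    t = 0
    while len(finish) < len(task_list):
        started = True
        while started:     # repeat so that zero-length tasks release their processor at once
            started = False
            for p in range(n):          # lower index first: ties go to the smaller index
                if free_at[p] > t:
                    continue
                for task in task_list:  # scan L from the beginning
                    ready = all(finish.get(q, float("inf")) <= t
                                for q in preds.get(task, ()))
                    if task not in finish and ready:
                        finish[task] = free_at[p] = t + mu[task]
                        started = True
                        break
        if len(finish) < len(task_list):
            # Idle processors wait until the next task completion.
            t = min(f for f in finish.values() if f > t)
    return max(finish.values(), default=0)


# Hypothetical instance: three tasks, no precedence constraints, two processors.
mu = {"T1": 1, "T2": 1, "T3": 2}
omega = list_schedule(2, ["T1", "T3", "T2"], mu, {})        # 2
omega_prime = list_schedule(2, ["T1", "T2", "T3"], mu, {})  # 3
```

On this instance the two lists give finishing times in the ratio 3/2, so the order of the list alone can change the finishing time.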

It will be helpful here to consider several examples. We shall indicate
the partial-order $\prec$ on $T$ and the function $\mu$ by a directed graph $G(\prec, \mu)$.
In $G(\prec, \mu)$, the vertices will correspond to the $T_j$ and a directed edge
from $T_i$ to $T_j$ will indicate that $T_i \prec T_j$. Each vertex of $G(\prec, \mu)$ will
actually be labelled with the symbol $T_j/\mu(T_j)$, the $\mu(T_j)$ indicating the
time necessary to execute $T_j$. The activity of each $P_i$ is conveniently
represented by a timing diagram $\mathcal{D}$ (also known as a Gantt diagram;
see Ref. 1). $\mathcal{D}$ will consist of $n$ horizontal half-lines (labelled by the $P_i$)
in which each line is subdivided into segments* and labelled according
to the state of the corresponding processor.

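Example 1: $n = 3$, $L$: $(T_3, T_1, T_2, T_4, T_5, T_6, T_7, T_8)$

[Figure: graph $G(\prec, \mu)$ and timing diagram $\mathcal{D}$ not reproduced; surviving vertex labels: $T_1/4$, $T_2/3$.]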

* We always consider the segments as being closed on the left and open on the 
right. 
The symbol $\varphi_i$ indicates a processor is idle (i.e., working on the empty
task $\varphi_i$) but not all the $T_j$ have been completed. The indexing of the $\varphi_i$
is arbitrary. Thus, for $\mathcal{D}$ we have $\omega = 9$.



Example 2: $n = 4$, $L$: $(T_1, T_2, T_3, T_4, T_5)$

[Figure: graph $G(\prec, \mu)$ and timing diagram not reproduced.]

Here, $\omega = 13$. Note that in this example, $\omega$ is independent of $L$. We
should also point out here that we are using the convention that whenever
any $T_j$ is completed, then all current empty tasks $\varphi_i$ are also terminated.
Processors still idle are then given "new" empty tasks to complete
(e.g., Pi in Example 2).
Example 3: $n = 3$, $L$: $(T_1, T_2, T_3, T_4, T_5, T_6, T_7)$

Suppose we use a different list $L'$ given by $L'$: $(T_1, T_2, T_4, T_5, T_6, T_3, T_7)$.
We then have

[Timing diagram $\mathcal{D}'$ for the list $L'$ not reproduced.]

Hence, by simply using a different list $L'$, we have shortened $\omega$ by nearly
a factor of two. The significance of this and similar examples will be
brought out in the next section.
We see that, in general, $\omega$ is a function of the task list $L$, the "time"
function $\mu$, the partial-order $\prec$, and the number of processors $n$ (in
addition to the rules under which the $P_i$ operate). In this note, we
investigate the factor by which $\omega$ can increase if we simultaneously:
(i) Change* the task list $L$;
(ii) Decrease the function $\mu$;
(iii) Relax the partial-order $\prec$;
(iv) Change the number of processors from $n$ to $n'$.

While it might first be expected that (ii), (iii), or (iv) (with $n' > n$)
would cause a decrease in $\omega$, easy counterexamples† show that this is not
always the case. In the next section we obtain an upper bound on the
factor by which $\omega$ can increase because of (i), (ii), (iii), and (iv) (cf.
the Theorem of Section II). This bound is just the expression $1 + (n - 1)/n'$. We
also show that this bound is the best possible in the sense that it cannot
be replaced by any smaller function of $n$ and $n'$.
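
To get a feel for the size of this bound, it can be tabulated for a few processor counts. The following is a minimal sketch; the function name `anomaly_bound` and the sample pairs are assumptions made for illustration only.

```python
from fractions import Fraction


def anomaly_bound(n, n_prime):
    """Exact value of 1 + (n - 1)/n', the bound on the factor by which omega can grow."""
    if n < 1 or n_prime < 1:
        raise ValueError("processor counts must be positive")
    return 1 + Fraction(n - 1, n_prime)


# Illustrative pairs (n, n'); the printed values are exact fractions.
for n, n_prime in [(2, 2), (3, 3), (3, 6), (6, 3)]:
    print(n, n_prime, anomaly_bound(n, n_prime))
```

When the number of processors is unchanged ($n = n'$) the expression reduces to $2 - 1/n$, so the bound stays below 2 however many processors are used.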

II. THE MAIN RESULTS 

We begin this section by considering a special case of the general
problem. We include this here in order to acquaint the reader with the
basic ideas which will be used later. Suppose we are given a set of tasks
$T = \{T_1, \ldots, T_m\}$ and a directed graph $G(\prec, \mu)$ giving a partial-order
$\prec$ and a time function $\mu$ on $T$. We execute these tasks twice, each time
using two identical processors $P_1$ and $P_2$. The first time the tasks are
executed we use a task list $L$ while the second time the tasks are executed
we use another task list $L'$. Suppose the corresponding finishing
times are $\omega$ and $\omega'$. The question we consider now is this: How much
can the ratio $\omega'/\omega$ vary? This is answered by the following

Proposition: $\frac{2}{3} \le \omega'/\omega \le \frac{3}{2}$.

Proof: By the symmetry of $\omega$ and $\omega'$ it suffices to show that $\omega'/\omega \le \frac{3}{2}$.
The basic idea we shall use is a simple one. Consider the timing diagram
$\mathcal{D}$ obtained when the tasks are executed using the list $L$. We want to
show that there is a chain‡ of tasks $T_{c_1} \prec T_{c_2} \prec \cdots \prec T_{c_r}$ which
has the property that whenever a processor is idle (i.e., executing an
empty task $\varphi_i$) then the other processor is executing one of the $T_{c_k}$.

* By "change" we mean "possibly change", etc.

† As far as the author is aware, these facts were first pointed out by Richards (see Ref. 3).

‡ I.e., a linearly-ordered subset using the partial-order $\prec$.
To define the $T_{c_k}$ we proceed as follows. First, let $T_{j_1}$ be defined to
be the task which has the latest finishing time in $\mathcal{D}$ (if there is more
than one such task then we choose the task which is executed by the
higher-indexed processor). Let $\varphi_{i_1}$ be the empty task which has the
latest finishing time of all those empty tasks which finish at a time not later
than the starting time of $T_{j_1}$. By the construction of $\mathcal{D}$, there must be a
task $T_u$ which has the same finishing time as $\varphi_{i_1}$. Define $T_{j_2}$ to be $T_u$.
In general, suppose we have defined $T_{j_k}$ for some $k \ge 2$. To define $T_{j_{k+1}}$,
let $\varphi_{i_k}$ be the empty task which has the latest finishing time of all those
empty tasks $\varphi_i$ which finish at a time not later than the starting time
of $T_{j_k}$. (If there are no such $\varphi_i$ then we are done, i.e., $T_{j_{k+1}}$ is not defined.)
By hypothesis, there must be a task $T_v$ which has the same
finishing time as $\varphi_{i_k}$ and which has a starting time not later than the
starting time of $\varphi_{i_k}$. Define $T_{j_{k+1}}$ to be $T_v$. We continue this algorithm
for as long as possible, say, until we have defined $T_{j_1}, \ldots, T_{j_r}$.

We first note that since no processor works on one empty task $\varphi_i$
while the other processor works on more than one task, then at any time
a processor is executing an empty task $\varphi_i$, the other processor is executing
one of the $T_{j_k}$. We next claim that $T_{j_{k+1}} \prec T_{j_k}$ for $1 \le k < r$. Suppose
this is not the case. If $t$ denotes the time at which a processor $P_i$ started
executing $\varphi_{i_k}$ then by the hypothesis concerning the operation of the
processors, $P_i$ should not have been idle (i.e., working on $\varphi_{i_k}$) since
at least one task, namely $T_{j_k}$, was eligible to be executed at that time.
Thus, the timing diagram $\mathcal{D}$ is not valid and we have a contradiction.
Hence, we must have $T_{j_{k+1}} \prec T_{j_k}$ for $1 \le k < r$. By defining
$T_{c_k} = T_{j_{r+1-k}}$ for $1 \le k \le r$, the first assertion is proved. It follows at once
that if we let $\mu(\varphi_i)$ denote the length of time a processor spends executing
$\varphi_i$, then

$$\sum_{\varphi_i \in \mathcal{D}} \mu(\varphi_i) \le \sum_{k=1}^{r} \mu(T_{c_k}). \tag{1}$$

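The proof of the proposition now follows directly. Let
$T_{i_1} \prec T_{i_2} \prec \cdots \prec T_{i_s}$ be chosen (by the assertion just established) so that

$$\sum_{\varphi_i' \in \mathcal{D}'} \mu(\varphi_i') \le \sum_{k=1}^{s} \mu(T_{i_k}), \tag{2}$$

where the $\varphi_i'$ are taken from $\mathcal{D}'$ (the timing diagram obtained when the
list $L'$ is used). Note that $\omega'$ can be written as:

$$\omega' = \frac{1}{2}\Bigl(\sum_{T_k \in T} \mu(T_k) + \sum_{\varphi_i' \in \mathcal{D}'} \mu(\varphi_i')\Bigr). \tag{3}$$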
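From (2) and (3) we have

$$\omega' \le \frac{1}{2}\Bigl(\sum_{T_k \in T} \mu(T_k) + \sum_{k=1}^{s} \mu(T_{i_k})\Bigr). \tag{4}$$

Since the following inequalities hold:

$$\omega \ge \frac{1}{2}\sum_{T_k \in T} \mu(T_k), \tag{5}$$

$$\omega \ge \sum_{k=1}^{s} \mu(T_{i_k}) \tag{6}$$

(where (6) follows from the fact that $T_{i_1} \prec T_{i_2} \prec \cdots \prec T_{i_s}$), then we
have from (4), (5), and (6)

$$\omega' \le \frac{1}{2}(2\omega + \omega) = \frac{3}{2}\,\omega$$

and the proposition follows.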

The following example shows that the upper bound of $\frac{3}{2}$ cannot be
replaced by any smaller value.

Example 4: $n = 2$, $L$: $(T_1, T_3, T_2)$, $L'$: $(T_1, T_2, T_3)$

[Figure: graph $G(\prec, \mu)$ and timing diagrams $\mathcal{D}$, $\mathcal{D}'$ not reproduced; surviving vertex labels: $T_1/1$, $T_3/2$.]

Therefore, $\omega'/\omega = \frac{3}{2}$ and the upper bound of the proposition is achieved.
Before stating the main theorem we introduce some notation. Let
$T = \{T_1, \ldots, T_m\}$ be a set of tasks. Let $G = G(\prec, \mu)$ and $G' = G'(\prec', \mu')$
be two directed graphs for $T$ with the partial-orders $\prec$, $\prec'$ and the time
functions $\mu$, $\mu'$. We say that $G' \le G$ if:

(i) $\mu' \le \mu$, i.e., $\mu'(T_j) \le \mu(T_j)$ for all $T_j \in T$.
(ii) $\prec' \subseteq \prec$, i.e., $T_i \prec' T_j$ implies $T_i \prec T_j$ for all $T_i, T_j \in T$.
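
The two conditions can be checked mechanically. The sketch below is only an illustration: the name `is_relaxation` and the data layout (dictionaries for the time functions, sets of ordered pairs for the partial-orders, each assumed to list the full transitive relation) are assumptions, not notation from the text.

```python
def is_relaxation(mu, mu_prime, order, order_prime):
    """Return True if G' <= G in the sense of conditions (i) and (ii).

    mu, mu_prime       -- dicts mapping each task to its execution time
    order, order_prime -- sets of pairs (a, b) meaning "a precedes b";
                          each is assumed to contain the full transitive relation
    """
    times_ok = all(mu_prime[task] <= mu[task] for task in mu)   # condition (i)
    order_ok = order_prime <= order                             # condition (ii): set inclusion
    return times_ok and order_ok


# Hypothetical two-task instance: G' shortens b and drops the only constraint.
g_times, g_order = {"a": 2, "b": 3}, {("a", "b")}
g_prime_times, g_prime_order = {"a": 2, "b": 1}, set()
relaxed = is_relaxation(g_times, g_prime_times, g_order, g_prime_order)
```

Swapping the roles of the two graphs in this instance fails both conditions, since the times grow and a constraint is added.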
Finally, suppose we execute the tasks twice, one time using the graph
$G$, a task list $L$ and $n$ processors, the other time using the graph $G'$, a
task list $L'$ and $n'$ processors. Let $\omega$ and $\omega'$ denote the respective
finishing times. We then have the

Theorem: If $G' \le G$ then

$$\frac{\omega'}{\omega} \le 1 + \frac{n - 1}{n'}.$$
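Proof: By a slight modification of the argument used in the proposition,
it follows that if $\varphi_i'$, $1 \le i \le v$, denote the empty tasks of $\mathcal{D}'$ then there
exists a chain $T_{i_1} \prec' T_{i_2} \prec' \cdots \prec' T_{i_s}$ of tasks in $T$ with
the property that whenever a processor is idle then some other processor is
executing one of the $T_{i_k}$. From this we conclude

$$\sum_{i=1}^{v} \mu'(\varphi_i') \le (n' - 1) \sum_{k=1}^{s} \mu'(T_{i_k}). \tag{7}$$

As before we note that

$$\omega' = \frac{1}{n'}\Bigl(\sum_{T_k \in T} \mu'(T_k) + \sum_{i=1}^{v} \mu'(\varphi_i')\Bigr) \le \frac{1}{n'}\Bigl(\sum_{T_k \in T} \mu'(T_k) + (n' - 1)\sum_{k=1}^{s} \mu'(T_{i_k})\Bigr), \tag{8}$$

where the inequality follows by (7). Since

$$\omega \ge \frac{1}{n} \sum_{T_k \in T} \mu(T_k) \ge \frac{1}{n} \sum_{T_k \in T} \mu'(T_k) \tag{9}$$

and

$$\omega \ge \sum_{k=1}^{s} \mu(T_{i_k}) \ge \sum_{k=1}^{s} \mu'(T_{i_k}), \tag{10}$$

then by (8), (9), and (10) we conclude

$$\omega' \le \frac{1}{n'}\bigl(n\omega + (n' - 1)\omega\bigr).$$

Hence,

$$\frac{\omega'}{\omega} \le 1 + \frac{n - 1}{n'}$$

and the theorem is proved.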
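
The final step of the proof is plain algebra. Written out, with $\omega$ and $\omega'$ the two finishing times and $n$, $n'$ the two processor counts, it might read as follows (a worked restatement, not an additional result):

```latex
\[
\omega' \;\le\; \frac{1}{n'}\bigl(n\omega + (n'-1)\omega\bigr)
        \;=\; \frac{n + n' - 1}{n'}\,\omega
        \;=\; \Bigl(1 + \frac{n-1}{n'}\Bigr)\omega .
\]
```

Setting $n = n' = 2$ recovers the constant $\frac{3}{2}$ of the proposition, and $n = n'$ in general gives $2 - 1/n$.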

To show that this bound is best possible, we give several examples,
which show that the bound can be attained (to within $\varepsilon$) by varying
any one of the four parameters $L$, $\mu$, $\prec$, or $n$.

Example 5: $L$ is varied.

$n = n'$, $\mu = \mu'$, $\prec = \prec'$.

$L = (T_1, T_2, \ldots, T_{n-1}, T_{2n-1}, T_n, T_{n+1}, \ldots, T_{2n-2})$

$L' = (T_1, T_n, T_{n+1}, \ldots, T_{2n-2}, T_2, T_3, \ldots, T_{n-1}, T_{2n-1})$

$G(\prec, \mu)$: vertices $T_1/1$, $T_2/1$, $\ldots$, $T_{n-1}/1$, $T_n/(n-1)$, $T_{n+1}/(n-1)$, $\ldots$, $T_{2n-2}/(n-1)$, $T_{2n-1}/n$.

Thus,

$$\frac{\omega'}{\omega} = \frac{2n - 1}{n},$$

which is the value of $1 + (n - 1)/n'$ when $n = n'$.

Example 6: $\mu$ is decreased.

$n = n'$, $\mu \ge \mu'$, $\prec = \prec'$
$L = L'$: ($T_1$, $T_2$, $\ldots$, T in)

[Figure: graph $G(\prec, \mu)$ and timing diagrams not reproduced. Surviving annotation: in $G$, $T_i \prec T_{2n+1}$ for $1 \le i \le n$; a further relation preceding $T_i$ is illegible.]

Thus,

$$\frac{\omega'}{\omega} = \frac{2n - 1 + \varepsilon}{n + 2\varepsilon},$$

which is arbitrarily close to $2 - (1/n)$ for $\varepsilon$ sufficiently small. We
should note the interesting fact that $\omega' \ge 2n - 1 + \varepsilon$ for any list $L'$
which may be used.



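Example 7: $\prec$ is relaxed.

$n = n'$, $\mu = \mu'$, $\prec' \subseteq \prec$
$L = L'$: ($T_1$, $T_2$,

[Remainder of the list, graph $G(\prec, \mu)$, and timing diagrams not reproduced.]

Thus,

$$\frac{\omega'}{\omega} = \frac{2n - 1}{n + \varepsilon},$$

which is arbitrarily close to $2 - (1/n)$ for $\varepsilon$ sufficiently small.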
Example 8: $n$ is varied.

Case 1: $n < n'$, $\mu = \mu'$, $\prec = \prec'$

$L = L'$: $(T_1, T_2, \ldots, T_{nn' - n' + n + 2})$

Thus,

$$\frac{\omega'}{\omega} = \frac{n' + n - 1 + \varepsilon}{n' + 2\varepsilon},$$

which is arbitrarily close to $1 + (n - 1)/n'$ for $\varepsilon$ sufficiently small.

Case 2: $n > n'$. The construction in this case is similar to that of Case 1
and will not be presented.

We should note that in Example 8 we took $L = L'$. If it is of some
consolation to a possibly battered intuition, it should be noted that if
$n \le n'$, $\mu = \mu'$, and $\prec = \prec'$ then for any $L$ which is chosen, it is
possible to choose a suitable $L'$ for which $\omega' \le \omega$.

III. CONCLUDING REMARKS 

It should be pointed out here that we have not considered models of
the multiprocessor system in which the priority list $L$ is "dynamically
formed" (as opposed to the fixed lists we have used thus far). For example,
one seemingly quite reasonable way of doing this is as follows:
At any time a processor is free, it immediately begins to execute the
"ready" task (i.e., one which has all its predecessors completed) which
currently heads the longest chain of unexecuted tasks (including itself).
Suppose by following this algorithm in choosing tasks, we have a finishing
time of $\omega^*$. If we denote by $\omega_0$ the least possible finishing time
(minimized over all lists), then we would like to assert something about
the ratio $\omega^*/\omega_0$. It follows from what has been proved in this paper
that $\omega^*/\omega_0 \le 2 - (1/n)$ (where $n$ is the number of processors) and
we would hope that, in fact, we could show $\omega^*/\omega_0$ is considerably closer
to 1 than this. Unfortunately, this is not possible since it can be shown
that the best possible bound on this ratio is given by

$$\frac{\omega^*}{\omega_0} \le 2 - \frac{2}{n + 1}.$$
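
The selection rule described above can be sketched in a few lines. The sketch assumes that the length of a chain means its total execution time (the text does not say whether length is counted in tasks or in time), and the names `chain_lengths`, `longest_chain_list`, and the sample data are hypothetical. Since a task that has not started has none of its successors started either, the longest chain it heads does not change during execution, so under this reading the rule amounts to a fixed list sorted by chain length.

```python
def chain_lengths(mu, succs):
    """Map each task to the total time of the longest chain it heads.

    mu    -- dict: task -> execution time
    succs -- dict: task -> iterable of immediate successors (assumed acyclic)
    """
    memo = {}

    def longest(task):
        if task not in memo:
            tail = max((longest(s) for s in succs.get(task, ())), default=0)
            memo[task] = mu[task] + tail
        return memo[task]

    return {task: longest(task) for task in mu}


def longest_chain_list(mu, succs):
    """Priority list ordered by decreasing chain length (ties keep input order)."""
    lengths = chain_lengths(mu, succs)
    return sorted(mu, key=lambda task: -lengths[task])


# Hypothetical instance: a precedes c; b is unconstrained.
mu = {"a": 1, "b": 2, "c": 3}
succs = {"a": ["c"]}
priority = longest_chain_list(mu, succs)   # ['a', 'c', 'b']
```

Here the short task `a` comes first because it heads the chain `a`, `c` of total length 4.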

It is interesting to note, however, that in the case in which the partial-order
$\prec$ on the tasks is empty, then this bound can be improved* to

$$\frac{\omega^*}{\omega_0} \le \frac{4}{3} - \frac{1}{3n},$$

which, again, is best possible.
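
When the partial-order is empty, every chain is a single task, so (measuring chain length by execution time, which is an assumption) the rule reduces to taking the longest remaining task first. The sketch below compares that rule with a brute-force optimum on a small instance; the function names and the five task times are hypothetical, chosen only to illustrate a case in which the ratio equals $4/3 - 1/(3n)$ for $n = 2$.

```python
import heapq
from fractions import Fraction
from itertools import product


def longest_first_makespan(times, n):
    """Finishing time when independent tasks are taken longest first."""
    loads = [0] * n
    heapq.heapify(loads)
    for t in sorted(times, reverse=True):
        # The next task goes to the processor that becomes free earliest.
        heapq.heappush(loads, heapq.heappop(loads) + t)
    return max(loads)


def optimal_makespan(times, n):
    """Least finishing time over all assignments (exhaustive; tiny inputs only)."""
    best = sum(times)
    for assignment in product(range(n), repeat=len(times)):
        loads = [0] * n
        for t, p in zip(times, assignment):
            loads[p] += t
        best = min(best, max(loads))
    return best


n = 2
times = [3, 3, 2, 2, 2]                          # hypothetical task times
omega_star = longest_first_makespan(times, n)    # 7
omega_zero = optimal_makespan(times, n)          # 6
ratio = Fraction(omega_star, omega_zero)
bound = Fraction(4, 3) - Fraction(1, 3 * n)
```

For this instance `ratio` and `bound` are both 7/6, so the bound is met with equality.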

In conclusion, one might ask just how "typical" the examples are for
which $\omega'/\omega$ is close to the upper bound $2 - (1/n)$. While very little
work has been done on this aspect, empirical results (using computer
simulation; see Ref. 1) indicate that examples in which $\omega'/\omega_0 \le 1.1$
are quite common.